The purpose of this document is to illustrate time series analysis and forecasting. We will use a simulated dataset to analyze things like visits, discharges and payments. To perform these analyses we will be following the modeltime workflow. This report will be broken down into sections that follow that same workflow.
Lets take a look at our data and see what it has.
df_tbl %>%
glimpse()## Rows: 160,649
## Columns: 12
## $ mrn <chr> "83537791", "66727914", "84487881", "19795240"~
## $ visit_id <chr> "8762718156", "1283065398", "1171004549", "133~
## $ visit_start_date_time <dttm> 2012-01-01 06:14:00, 2011-12-26 01:14:00, 201~
## $ visit_end_date_time <dttm> 2012-01-01 00:00:00, 2012-01-01 12:06:00, 201~
## $ total_charge_amount <dbl> 4379.00, 62580.61, 38466.48, 16561.16, 16725.3~
## $ total_adjustment_amount <dbl> -4071.41, -39117.58, -26930.67, -11211.26, -11~
## $ total_payment_amount <dbl> -307.59, -23463.03, -11535.81, -5349.90, -5273~
## $ payer_grouping <chr> "Medicaid", "Commercial", "Blue Cross", "Medic~
## $ service_line <chr> "General Outpatient", "Medical", "Surgical", "~
## $ ip_op_flag <chr> "O", "I", "I", "I", "I", "I", "I", "I", "I", "~
## $ adm_date <date> 2012-01-01, 2011-12-26, 2011-12-31, 2011-12-2~
## $ dsch_date <date> 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-0~
skim(df_tbl)| Name | df_tbl |
| Number of rows | 160649 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| Date | 2 |
| numeric | 3 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| mrn | 0 | 1 | 8 | 8 | 0 | 75614 | 0 |
| visit_id | 0 | 1 | 10 | 10 | 0 | 160637 | 0 |
| payer_grouping | 0 | 1 | 1 | 14 | 0 | 13 | 0 |
| service_line | 0 | 1 | 2 | 44 | 0 | 29 | 0 |
| ip_op_flag | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| adm_date | 0 | 1 | 2010-11-26 | 2019-12-31 | 2015-08-24 | 2956 |
| dsch_date | 0 | 1 | 2012-01-01 | 2019-12-31 | 2015-08-28 | 2922 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| total_charge_amount | 0 | 1.00 | 46082.61 | 63709.76 | 0.13 | 12960.31 | 29371.79 | 55058.56 | 3686443.50 | ▇▁▁▁▁ |
| total_adjustment_amount | 0 | 1.00 | -39892.10 | 59053.62 | -3059344.26 | -48460.49 | -23357.11 | -9586.85 | 87590.13 | ▁▁▁▁▇ |
| total_payment_amount | 3299 | 0.98 | -8965.33 | 12691.35 | -627099.24 | -10904.82 | -6125.13 | -2779.48 | 1629.69 | ▁▁▁▁▇ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| visit_start_date_time | 0 | 1 | 2010-11-26 05:00:00 | 2019-12-31 05:00:00 | 2015-08-24 10:02:00 | 118060 |
| visit_end_date_time | 0 | 1 | 2012-01-01 00:00:00 | 2019-12-31 22:58:00 | 2015-08-28 00:00:00 | 103820 |
Our objectives are to:
Our forecasting will focus on a grouped forecast where we are going to forecast the number of discharges by inpatient/outpatient visit type and by payer grouping.
We are going to do this on a weekly scale.
df_tblsummarise_by_time() with .by = "month", and n() the visits.transactions_monthly_tbltransactions_monthly_tbl <- df_tbl %>%
filter(payer_grouping != "?") %>%
group_by(ip_op_flag, payer_grouping) %>%
summarise_by_time(
.date_var = dsch_date
, .by = "month"
, value = n()
) %>%
ungroup()
transactions_monthly_tbl## # A tibble: 2,137 x 4
## ip_op_flag payer_grouping dsch_date value
## <chr> <chr> <date> <int>
## 1 I Blue Cross 2012-01-01 153
## 2 I Blue Cross 2012-02-01 156
## 3 I Blue Cross 2012-03-01 145
## 4 I Blue Cross 2012-04-01 178
## 5 I Blue Cross 2012-05-01 188
## 6 I Blue Cross 2012-06-01 155
## 7 I Blue Cross 2012-07-01 175
## 8 I Blue Cross 2012-08-01 183
## 9 I Blue Cross 2012-09-01 145
## 10 I Blue Cross 2012-10-01 133
## # ... with 2,127 more rows
transactions_monthly_tbl %>%
plot_time_series(
.date_var = dsch_date
, .color_var = ip_op_flag
, .facet_vars = payer_grouping
, .facet_ncol = 4
, .value = value
, .smooth = FALSE
)